Non-Verbatim Copyright Infringement Detection for Text
Abstract
The Problem: Copyright infringement is a serious problem that threatened authors of creative works even in the non-electronic world. In the electronic world, easy access to documents and the ease of reproducing and distributing them have made copyright infringement an even bigger problem. In general, the ease of reproducing and distributing online documents threatens not only the economic interests of authors but also the growth of the Internet as an online resource.
Similar resources
An Exploration of the Challenges Arising from Orphan Works
Orphan works are defined as copyrighted works for which the copyright holder cannot be identified or located. People who wish to use such a work as the basis for creating new work, such as a digital archive, compilation, or adaptation, feel unable to do so due to the risks associated with copyright infringement. If they can’t find a person to get permission and go ahead with thei...
Source Code Authorship Attribution using n-grams
Plagiarism and copyright infringement are major problems in academic and corporate environments. Existing solutions for detecting infringements in structured text such as source code are restricted to textual similarity comparisons of two pieces of work. In this paper, we examine authorship attribution as a means for tackling plagiarism detection. Given several samples of work from several auth...
Capturing Expression Using Linguistic Information
Recognizing similarities between literary works for copyright infringement detection requires evaluating similarity in the expression of content. Copyright law protects expression of content; similarities in content alone are not enough to indicate infringement. Expression refers to the way people convey particular information; it captures both the information and the manner of its presentation...
Measuring Text Reuse in a Journalistic Domain
This paper describes a general framework for measuring text reuse. This term is used to describe how content from one or more known sources can be reused either verbatim (word-for-word copy) or otherwise rewritten depending upon factors influencing the creation of a new document. These may include reduction/increase in length, change of style, simplification of content, shif...
Postgraduate Transfer Report.PDF
This thesis builds upon our current understanding of text reuse by proposing a hypothetical framework of text reuse and applying this abstract definition to a specific domain, that of journalistic reuse. The framework aims to explore a suitable measure of reuse and determine suitable discriminators for document derivation. Although text can be reused verbatim (word-for-word), in most cases, tex...